Abstract

Infectious disease remains, despite centuries of work to control and mitigate its effects, a
major problem facing humanity. This paper reviews the mathematical modelling of infectious
disease epidemics on networks, starting from the simplest Erdős-Rényi random graphs, and
building up structure in the form of correlations, heterogeneity and preference, paying particular
attention to the links between random graph theory, percolation and dynamical systems
representing transmission. Finally, the problems posed by networks with a large number of
short closed loops are discussed.

1 Introduction 

In science (and particularly theoretical science) it is necessary to approximate to make progress. 
Such approximations can be non-rigorously categorised into three types. 

Type I Approximations that cut out the 'unnecessary' complexity in a system to yield an appropriate 
mathematical representation. For example, it is not necessary to worry about the weak 
nuclear force when you are modelling a bungee jump; gravity and the elasticity of the rope 
are all that is needed. 

Type II Approximations that are controlled up to a power of some small quantity, i.e. "Corrections
to this result will be $O(\varepsilon)$."

Type III Approximations of mathematical convenience, made so that a system can be analysed but 
with little other motivation. 

As a former physicist now working in biology, I believe that the reason physics has been so successful 
in developing theory is that physical experimental systems can often be modelled by making Type 
I and Type II approximations. In biology, medicine, and sociology, it is much more frequent that 
Type III approximations are made. 

Despite this difficulty in making well motivated simplifying assumptions, modern biology is one 
of the most exciting areas of science, with enormous quantities of interesting experimental data 
that pose major unanswered scientific questions. In some cases, laboratory techniques are able 
to make relatively precise measurements - although these are nowhere near the accuracy of, say, 
atomic physics - but in other cases even repeatable experiments are not possible. This is similar 
to what happens in cosmology, where we only see one realisation of the universe. Cosmological 
models are, of course, highly informative and useful, but they are not as accurate as the incredible 
agreement between theory and data on the spectrum of atomic hydrogen. 

Epidemiology is the study of patterns of disease in populations. It started out as the study of 
infectious agents (the topic of this review) but has grown to encompass the diseases of lifestyle and 
affluence like cancer and heart disease. Sometimes controlled experiments infect human volunteers 
with mild illnesses, or cause more severe disease in non-human animals, but since infectious diseases 
can never be ethically released into non-laboratory populations, infectious disease epidemiology as 
defined is an observational science, like cosmology. While life-threatening infections are now largely 
under control in rich countries, they kill millions, including a large proportion of children under 
5, in the developing world [HJ Fig. 5]. One of the global problems we face in the 21st century is 
the unequal distribution of calorie intake. In the US and EU, obesity is driving epidemics of heart 
disease and diabetes; but malnutrition is a large part of the reason that infectious diseases are so 
deadly for poor children [25].

While we wait for a political solution to the fundamental problem of global inequality, scientists
can do several things to help limit the human cost of infectious diseases. Laboratory biologists and 
clinicians can develop and test new treatments, vaccines and behavioural interventions. But given 
the possibility of deploying such interventions, there will always be the question of how to optimise 
their use. Epidemiologists have therefore got two scientific tasks: first, to identify the key routes of 
disease transmission; and secondly, to design optimal interventions making use of that knowledge. 

The aim of this review is to introduce some of the mathematics used in modern infectious 
disease epidemiology, in particular the sub-discipline of modelling epidemics on networks, where 
tools from statistical physics are increasingly used. In contrast to much of physics, there is very 
little consensus amongst researchers about many important issues in epidemic modelling. This 
means that a review must either be highly technical, so that a reader can make up their own mind, 
or somewhat subjective. I have adopted the latter approach, trying to signpost clearly where a 
statement is a personal opinion. Also, key biological insights obtained from these mathematical 
techniques are highlighted at appropriate points. 



2 The SIR model 

2.1 Infection dynamics 

Suppose we have a 'closed population' - i.e. a large number of individuals, with no births or deaths
during the period of time modelled. To motivate the model most commonly considered in epidemic
modelling, three approximations are made.

1. The compartmental approximation: Each individual is either susceptible to infection, infectious
with the disease, or recovered and immune. Write S(t) for the proportion of the
population that is susceptible, I(t) for the proportion that is infectious, and R(t) for the
proportion that is recovered.

2. The mass-action approximation: Infection happens between each susceptible and infectious
individual in the population at a constant rate $\beta$.

3. The Markovian approximation: Recovery from infection is a Markovian (memory-less) process,
and happens at a constant rate $\gamma$.

These assumptions lead to the SIR (susceptible-infectious-recovered) equations below in the limit
as population size becomes extremely large.

\[ \frac{dS}{dt} = -\beta S I , \qquad \frac{dI}{dt} = \beta S I - \gamma I , \qquad \frac{dR}{dt} = \gamma I . \tag{1} \]

It is worth thinking briefly about the approximations introduced. I would argue that, at least for
some diseases, the compartmental assumption is Type I - it helps us to get our thinking about the 
problem straight, even if there is a more detailed microscopic story about microbes, white blood 
cells and antibodies. Putting people into discrete compartments is what empirical epidemiologists 
often do, counting present and former cases rather than trying to determine where all the viruses 
and bacteria are. In contrast, the Markovian approximation is definitely Type III. There is no good 
reason to think that recovery from illness has no memory and is as likely one hour after infection 
as it is a week after. But Markovian dynamics remain popular for two reasons: many results are 
not sensitive to this assumption; and it does massively simplify epidemic modelling. Finally, the 
mass-action assumption works well for small, well-connected populations like boarding
schools [T71 Fig. 2.4]; but the equations (1) assume that the population is extremely large.
I would therefore categorise mass action, alongside the Markovian approximation, as something
that is assumed for mathematical convenience.

Figure 1: Comparison of the SIR model (solid red line, with 95% CI as dashed red lines) with
death data (black line with circles) for the main wave of the 1918-19 influenza pandemic in
England and Wales.



Biological insight 

Despite its simplicity, the SIR model fits many epidemics well. Figure 1 shows what happens when
this model is fitted to the main wave of the 1918-19 pandemic in England and Wales. Clearly, there 
are features of the data that are not captured by the model, but it still acts as a good 'starting 
point' for understanding epidemics of infectious disease. 
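
The behaviour of the model can be explored numerically. The following is a minimal sketch, assuming the standard mass-action form of the SIR equations (dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI) and a fixed-step Runge-Kutta integrator; the function names and parameter values are illustrative assumptions and are not fitted to the 1918-19 data.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR equations for the proportions (S, I, R)."""
    s, i, _ = state
    new_infections = beta * s * i  # mass-action infection term
    recoveries = gamma * i         # Markovian recovery term
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one fourth-order Runge-Kutta step of length dt."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(k3, dt), beta, gamma)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate_sir(beta, gamma, s0, i0, t_end, dt=0.01):
    """Return a list of (time, (S, I, R)) pairs from t = 0 to t = t_end."""
    state = (s0, i0, 1.0 - s0 - i0)
    trajectory = [(0.0, state)]
    for step in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(state, dt, beta, gamma)
        trajectory.append((step * dt, state))
    return trajectory


# Illustrative (assumed) parameters: beta = 2, gamma = 1, 0.1% initially infectious.
trajectory = integrate_sir(beta=2.0, gamma=1.0, s0=0.999, i0=0.001, t_end=40.0)
final_s, final_i, final_r = trajectory[-1][1]
peak_i = max(state[1] for _, state in trajectory)
print(f"final S = {final_s:.4f}, peak I = {peak_i:.4f}")
```

Because S + I + R is conserved by the equations, checking that the three proportions still sum to one at the end of the run is a cheap test of the integrator.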

2.2 Early behaviour of an epidemic 

Network theory enters epidemiology as an attempt to relax the assumption of mass action; before
turning to this, let us analyse equations (1) using the theory of dynamical systems. The first thing
to note about them is that they are conservative, i.e. the quantity S + I + R is invariant over time.
This follows from the assumption that there are no births and deaths in the population, and means
that we only need to specify two initial conditions to integrate the system: S(0) and I(0) such that
$S(0) + I(0) \le 1$. If we start out with $I(0) \ll 1$, then the equations (1) can be linearised to give

\[ \frac{dI}{dt} \approx (\beta S(0) - \gamma) I \quad \Rightarrow \quad I(t) \approx I(0) e^{(\beta S(0) - \gamma) t} . \tag{2} \]


Therefore, if a small amount of infection is introduced into a population, we will see initial exponential
growth if $\beta S(0) > \gamma$, and a decline in the number of infectious individuals otherwise.

2.3 The basic reproductive ratio 

A quantity that is often defined in epidemic models is the basic reproductive ratio, $R_0$ (distinct from
the initial proportion of the population in the recovered group, R(0), in an unfortunate but fixed
notational convention). This is defined verbally as the expected number of additional infectious
individuals produced by a typical infectious individual early in the epidemic. By simple logical
argument, this quantity must exceed unity for an epidemic to grow, since otherwise each infected
individual fails to produce, on average, more than one new infected individual before recovering.
For the SIR model above, we can simply write down the basic reproductive ratio:

\[ R_0 = \frac{\beta S(0)}{\gamma} . \tag{3} \]

Clearly, the verbal argument that this quantity should exceed unity for an epidemic to take off
agrees with the dynamical argument made above about the linearised system (2). $R_0$ is widely
regarded as one of the most important contributions of mathematical analysis to infectious disease
epidemiology, and can be defined for many different epidemic models [10]. On a general network,
the appropriate definition of this quantity becomes more difficult; instead it is easier to focus on
early behaviour, as analysed above, and epidemic final size.



2.4 Final size and vaccination 

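Now let us manipulate the SIR model (1), dividing the first equation by the third to give

\[ \frac{dS}{dR} = -\frac{\beta}{\gamma} S \quad \Rightarrow \quad S(t) = S(0) e^{(R(0) - R(t)) \beta / \gamma} . \tag{4} \]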

This relationship allows us to derive results about the final outcome of an epidemic. Before doing 
this, let us consider the impact of vaccination on the infection. Conceptually, there are two kinds 
of vaccination that represent extreme limits within which biological reality falls. The first of these 
is leaky vaccination, which reduces the susceptibility of every individual in the population. This 
is modelled by scaling the transmission rate $\beta \to \varepsilon\beta$ for $\varepsilon$ between 0 and 1. The second kind
of vaccination is called all-or-nothing vaccination, in which a proportion $p_v$ of the population
is vaccinated and completely immune to disease. This is modelled by taking initial conditions
$S(0) = 1 - p_v - I(0)$, $I(0) \ll 1$, $R(0) = p_v$. In reality, vaccines can provide comprehensive
immunity in some individuals and partial immunity in others, while coverage never reaches 100%;
the concepts of leaky and all-or-nothing vaccination are therefore best seen as limiting cases.

Having included the effects of vaccination, we can then use the relationship between S and R
derived above to calculate the value $R(\infty) - p_v$, which is the proportion of a population
experiencing disease during an epidemic, a quantity often called the attack rate by epidemiologists.
Note that medics use the word 'rate' incorrectly to mean a proportion or percentage, rather than
something with units of time$^{-1}$. They are not going to stop doing this, so it is just something
quantitative scientists have to live with. The attack rates calculated are shown in Figure 2(a).

Biological insight 

There are several interesting features of Figure 2(a), but three are particularly worth highlighting:

1. A finite proportion of the population experiences disease if and only if $R_0 > 1$. Below this
threshold the final size is zero.

2. $R_0$ does not need to be much larger than 1 to generate an extremely large epidemic. The
gradient of the curves is steep for $R_0$ just over 1.

3. Regardless of how large the transmission rate is, the final size is always strictly less than
S(0). Epidemics end because they run out of infectious individuals, not because they run
out of susceptibles.
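
These features can be reproduced with a few lines of code. The sketch below assumes the limit I(0) → 0, so that S(0) = 1 - p_v, and uses the relation S(∞) = S(0) exp(-(εβ/γ)(R(∞) - R(0))) together with S(∞) + R(∞) = 1; the attack rate A = R(∞) - p_v then satisfies A = S(0)(1 - exp(-(εβ/γ)A)). The function name and the bisection solver are illustrative choices rather than the method used to produce Figure 2(a).

```python
import math


def attack_rate(beta_over_gamma, epsilon=1.0, p_v=0.0, tol=1e-12):
    """Attack rate A = R(inf) - p_v of an SIR epidemic in the limit I(0) -> 0.

    epsilon scales transmission (leaky vaccination); p_v is the proportion
    made completely immune (all-or-nothing vaccination).
    """
    s0 = 1.0 - p_v
    k = epsilon * beta_over_gamma
    if k * s0 <= 1.0:
        return 0.0  # R_0 <= 1: no large epidemic

    def f(a):
        return s0 * (1.0 - math.exp(-k * a)) - a

    # f is positive just above 0 and negative at a = s0, with one root between.
    lo, hi = 0.0, s0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for ratio in (0.9, 1.1, 1.5, 2.0, 4.0):
    print(f"beta/gamma = {ratio}: attack rate = {attack_rate(ratio):.4f}")
```

With no vaccination, a ratio β/γ of only 1.5 already gives an attack rate above one half, which illustrates the steep gradient just above the threshold.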

We now turn to the general theory of networks, before looking for parallels between these and 
epidemics. 

3 Network theory 
3.1 Fundamental concepts 

A network is made up of two objects: nodes, and links. These are abstract terms for the individuals 
we want to consider, and the relationships between them respectively. In the current context, these 
could be individuals who can be infected and contacts that can lead to the transmission of disease. 
Suppose we have N nodes labelled by indices i, j = 1, ..., N. Then a common way to represent
network structure is through the adjacency matrix $G = (G_{ij})$, where

\[ G_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are linked,} \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

Figure 2: (a) shows the final size of an SIR epidemic. Note that for this model,
$R_0 = \varepsilon\beta S(0)/\gamma$. (b) shows the giant component size of an ER random graph -
spot the difference!



It is possible to define generalisations of this, where the matrix is not equal to its transpose (leading
to an asymmetric network) or takes general values (leading to a weighted network). There is also
the question of whether a node can be connected to itself; but in the context of infectious disease,
it makes most sense to assume that $G_{ii} = 0$ so that nodes do not link to themselves.

There are many different properties of a network that can be defined, and a recent textbook
summarises these quite comprehensively [22]. Perhaps the most fundamental, however, is the
notion of a node's degree. The degree of node i is

\[ k_i = \sum_j G_{ij} . \tag{6} \]

We will also write $N_k$ for the number of nodes of degree k, so that $d_k = N_k/N$ is a discrete
distribution known as the network's degree distribution.
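
As a concrete illustration, the degrees and the degree distribution can be read off an adjacency matrix directly. This is a minimal sketch assuming a symmetric 0/1 matrix stored as a list of lists; the function name and the four-node example are hypothetical.

```python
from collections import Counter


def degree_distribution(adjacency):
    """Return (degrees, d) where degrees[i] is the row sum of node i and
    d[k] is the proportion of nodes with degree k."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    counts = Counter(degrees)
    return degrees, {k: counts[k] / n for k in sorted(counts)}


# Hypothetical four-node network: a triangle 0-1-2 with node 3 attached to node 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
degrees, d = degree_distribution(adjacency)
print(degrees)  # [2, 2, 3, 1]
print(d)        # {1: 0.25, 2: 0.5, 3: 0.25}
```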

Another particularly important concept for epidemic networks is that of a component - a set of 
nodes for which any pair is linked to each other through a finite-length path through the network. 
By labelling the nodes correctly, it is possible to write the adjacency matrix in block diagonal form:

\[ G = \begin{pmatrix} G^{(1)} & & \\ & G^{(2)} & \\ & & \ddots \end{pmatrix} , \tag{7} \]

so that $G^{(C)}$ is the adjacency matrix for component C. There is clearly a qualitative difference
between a network in which a significant number of the nodes are in one component, and a network 
made up of many small components. We will now turn to how networks can move between one 
regime and the other. 
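
Finding the components of a given network is a standard graph search. The sketch below uses a breadth-first search over an adjacency matrix stored as a list of lists; the function name and the five-node example are assumptions made for illustration.

```python
from collections import deque


def components(adjacency):
    """Return the components of an undirected network as sorted lists of nodes."""
    n = len(adjacency)
    label = [None] * n  # index of the component each node belongs to
    found = []
    for start in range(n):
        if label[start] is not None:
            continue  # already reached from an earlier start node
        label[start] = len(found)
        members = [start]
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adjacency[i][j] and label[j] is None:
                    label[j] = len(found)
                    members.append(j)
                    queue.append(j)
        found.append(sorted(members))
    return found


# Hypothetical network with links 0-1, 1-2 and 3-4.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(components(adjacency))  # [[0, 1, 2], [3, 4]]
```

Relabelling the nodes in the order in which they appear in the returned lists puts the adjacency matrix into block diagonal form.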



3.2 Erdős-Rényi random graphs

Stochastic processes that produce networks are called random graph models. The word 'graph' is 
essentially synonymous with the word 'network' in this context, although some authors do make a 
distinction. These models are useful for a variety of reasons. It may be that the family of networks 
produced by a random graph model has interesting properties; or the random graph model might 
be used as a null model - i.e. something to test against real data - in statistical work. 

The Erdős-Rényi (ER) random graph model involves taking N individuals, and putting a link
between each of the $N(N-1)/2$ pairs of individuals with independent probability $\pi$. While a
highly mathematical treatment of this model is possible [11], we will argue heuristically here. Of
particular interest is the size of the components of the network produced. The largest component in a
network is called the giant component; the key qualitative difference of network types is whether
the giant component size as a proportion of the nodes, $\mathcal{S}$, tends to 0 as $N \to \infty$, or whether it tends
to some finite value between 0 and 1.

Let us suppose we are in the latter situation, and pick a random node in the graph. The 
probability that this node is not in the giant component is x, which is the same for all nodes since 
they are not differentiated and we have picked randomly. Now consider all other nodes in the 
network - if the initial node is not in the giant component, then they must be either not connected 
to the initial node, or connected to the initial node and not in the giant component themselves. 
We can write this statement mathematically as 

\[ x = \left( (1 - \pi) + \pi x \right)^{N-1} , \tag{8} \]

which is a polynomial in x with no simple analytic solution. As N increases, even numerical
solution of (8) becomes difficult, and it is necessary to take the limit $N \to \infty$, holding constant
the mean number of contacts per node $c = (N-1)\pi$, so that, since
$\left(1 - c(1-x)/(N-1)\right)^{N-1} \to e^{-c(1-x)}$,

\[ x = e^{-c(1-x)} \tag{9} \]

is the appropriate equation for the probability that a node is not in the giant component of a very
large ER graph. Already, the similarity between this expression and the final-size relation of §2.4
should be clear, but the analogy can be made still stronger by consideration of a slightly more
general model. Before doing this, note that c is the mean node degree, and the network's degree
distribution (as defined in §3.1 above) will be Poisson with parameter c in the limit $N \to \infty$.

One important qualitative feature of the ER random graph model is that it undergoes a phase
transition at c = 1. Below this critical value, the equation (9) does not have a solution such
that 0 < x < 1 and there is no sizeable giant component - the network is made of lots of small
components. For c > 1, one component dominates, meaning that a disease spreading around the
population can reach a significant proportion of individuals.
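
The heuristic argument can be checked against a direct simulation. The sketch below assumes the large-N relation x = exp(-c(1 - x)) for the probability x that a node is outside the giant component, solves it by fixed-point iteration from x = 0, and compares the result with the largest component of one simulated ER graph. The function names, the network size and the seed are illustrative assumptions.

```python
import math
import random


def er_giant_fraction_theory(c, iterations=10_000):
    """Giant component proportion 1 - x, where x is the smallest root of
    x = exp(-c (1 - x)) in [0, 1], found by iterating upwards from x = 0."""
    x = 0.0
    for _ in range(iterations):
        x = math.exp(-c * (1.0 - x))
    return 1.0 - x


def er_largest_component_fraction(n, c, rng):
    """Simulate one ER graph with mean degree c and return the proportion of
    nodes in its largest component, tracked with a union-find structure."""
    pi = c / (n - 1)  # link probability giving mean degree c
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < pi:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


rng = random.Random(1)
theory = er_giant_fraction_theory(2.0)
simulated = er_largest_component_fraction(1000, 2.0, rng)
print(f"c = 2: theory {theory:.3f}, one simulation {simulated:.3f}")
```

A single simulated network of finite size will only agree with the limiting result approximately, so several realisations (or a larger N) would be needed for a careful comparison.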

3.3 Percolation on graphs and epidemics 

Percolation is a standard tool in statistical physics [25]: in the context of networks, there are two
ways that this method is used to modify an existing network. In site percolation, each original node
is present in the modified network with independent probability $p_s$ (i.e. we remove a proportion
$1 - p_s$ of nodes at random). In bond percolation, each original link is present in the modified network
with independent probability $p_b$ (i.e. we remove a proportion $1 - p_b$ of links at random).

Suppose we have applied both node and link removals to an ER random graph, and make the
same argument about picking a node at random. If the node is not in the giant component, then
either it is not present following site percolation, or else all other nodes in the network that survive
site percolation must either: (i) not be linked in the original ER graph, or be linked in the original
ER graph and have the link deleted during bond percolation; or (ii) be linked in the original ER
graph, have the link remain during bond percolation, and not be in the giant component. After
some mathematical manipulations along the lines of those used to derive (9), taking the $N \to \infty$
limit for constant c gives

\[ x = (1 - p_s) + p_s e^{-c p_b (1 - x)} . \tag{10} \]

Then the giant component size is given by $\mathcal{S} = 1 - x$, once we have solved for x. Figure 2(b) shows
the results of doing this; having gone through all the maths above, it is not surprising that both plots
in Figure 2 are identical since the equations used to generate them are mathematically isomorphic.
In fact we can write this equivalence out explicitly as shown in Table 1. But if you had seen the
SIR equations (1), and heard a description of both the ER random graph model and percolation,
would this equivalence be obvious?

I do not believe it would. In fact, the relationship between differential equations that describe 
dynamical processes over time and static models based on probabilities is quite subtle. For example, 
some infections do not lead to long-lasting immunity and after recovery individuals can become 
susceptible again. For such infections, there is no simple process of deletions of nodes and links
that gives the potential of the epidemic to take off or the final impact of the epidemic. Similarly,

Table 1: Parallels between SIR epidemics and ER random graphs

Figure 3: (a) shows a regular graph in which every node has degree 2. (b) shows a
heterogeneous network where some nodes are high-degree with 3 links and others are
low-degree with 1 link, but there are connections between these two types of node.
(c) shows a highly degree-assortative heterogeneous network where high-degree and low-degree
nodes connect only to other nodes of the same type.



even for the simpler SIR case where immunity to further infection is long-lasting, site- and bond 
percolation do not work as calculational tools where short, closed loops are present in the network 
in appreciable numbers. 

But where percolation works, it is very useful, since there are few other analytic approaches 
to epidemics on networks. Generally applicable Monte-Carlo methods, where a computer picks 
random numbers to simulate the epidemic process, can be highly computationally intensive. 
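
To make the Monte-Carlo approach concrete, here is a minimal sketch of one stochastic SIR epidemic on a network using the Gillespie algorithm. It assumes Markovian dynamics with a per-link transmission rate `tau` and a recovery rate `gamma`, and a simple undirected network given as neighbour lists; the function name, the ring network and the parameter values are illustrative assumptions.

```python
import random


def gillespie_sir(neighbours, tau, gamma, index_case, rng):
    """Simulate one SIR epidemic; return (end time, list of final states).

    neighbours[i] lists the nodes linked to node i.
    """
    n = len(neighbours)
    status = ["S"] * n
    status[index_case] = "I"
    infectious = {index_case}
    # pressure[i] counts the infectious neighbours of node i
    pressure = [0] * n
    for j in neighbours[index_case]:
        pressure[j] += 1
    t = 0.0
    while infectious:
        at_risk = [i for i in range(n) if status[i] == "S" and pressure[i] > 0]
        weights = [pressure[i] for i in at_risk]
        infection_rate = tau * sum(weights)
        recovery_rate = gamma * len(infectious)
        total_rate = infection_rate + recovery_rate
        t += rng.expovariate(total_rate)  # exponential waiting time to next event
        if rng.random() * total_rate < recovery_rate:
            # Recovery of a uniformly chosen infectious node
            node = rng.choice(sorted(infectious))
            status[node] = "R"
            infectious.remove(node)
            change = -1
        else:
            # Infection of a susceptible node, weighted by its infectious neighbours
            node = rng.choices(at_risk, weights=weights)[0]
            status[node] = "I"
            infectious.add(node)
            change = 1
        for j in neighbours[node]:
            pressure[j] += change
    return t, status


# Hypothetical example: a ring of 30 nodes with node 0 as the index case.
n = 30
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
end_time, status = gillespie_sir(ring, tau=2.0, gamma=1.0, index_case=0,
                                 rng=random.Random(1))
print(f"epidemic ended at t = {end_time:.2f} with {status.count('R')} cases")
```

Each event here costs a pass over all nodes, which is the simplest thing to write but illustrates why such simulations become expensive on large networks.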

4 Correlation, heterogeneity and preference in epidemic networks

We now move on to networks that contain more structure than ER random graphs. Figure [3] 
shows three kinds of network: (a) a 2-regular graph in which every node participates in two links; 
(b) a network with finite-variance degree distribution and no preference for nodes to link to other 
nodes of a similar degree; and (c) a network with finite-variance degree distribution and a strong 
preference for nodes to link to other nodes of a similar degree. We will consider each of these types 
of network in turn. 

4.1 Regular graphs 

The ER random graph model is something of a special case, since every link's presence (or absence) 
is an independent chance event. One of the simplest ways to introduce correlations between links 
is to constrain the node degree so that every node has constant degree n. Such random graphs 
are called n-regular, and can be constructed in several ways; however it is clear that the regularity 
condition means that each link's presence is not independent of others'. A node that already has n 
links cannot participate in any more, while a node with 0 links must be allocated a further n. We
consider algorithms for dealing with this later, but for now let us think about the giant component
size for such graphs.

4.1.1 Giant component size and percolation

In order to consider this, we make an argument much like that made for ER graphs above. Let x 
be the probability that an individual in an n-regular graph constructed in such a way as to avoid 
the presence of short loops is not in the giant component. Then each of its n neighbours must also 
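not be in the giant component. Writing $\tilde{x}$ for the probability that these neighbours are themselves
not in the giant component (each neighbour has only n - 1 further links, since one is used to reach it),
we can write

\[ x = \tilde{x}^{n} , \qquad \tilde{x} = \tilde{x}^{n-1} . \tag{11} \]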

Then there are clearly three cases to consider. 

n = 1: So (11) $\Rightarrow \tilde{x} = 1 = x$, and the network is composed of isolated pairs of nodes.

n ≥ 3: Then $x = \tilde{x} = 0$ satisfies (11), and all nodes are in the giant component.

n = 2: This is the critical value for n. Going through a more careful argument shows that long 
chains of nodes are formed, but the expected length of the longest of these grows more slowly 
than the network size N. 

While it is possible to consider site percolation for these networks as a model for vaccination, for
simplicity let us consider how bond percolation affects the result (11) to give

\[ x = \left( (1 - p_b) + p_b \tilde{x} \right)^{n} , \qquad \tilde{x} = \left( (1 - p_b) + p_b \tilde{x} \right)^{n-1} . \tag{12} \]

These equations do not have a simple analytic solution (although for small n, the polynomials 
involved can be factorised) but are quick to solve numerically. 
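
A numerical solution takes only a few lines. The sketch below assumes the pair of relations x = ((1 - p_b) + p_b x̃)^n and x̃ = ((1 - p_b) + p_b x̃)^(n-1), and solves the second by fixed-point iteration from x̃ = 0, which converges upwards to the smallest root in [0, 1]; the function name and iteration count are illustrative assumptions.

```python
def regular_graph_giant_component(n, p_b, iterations=10_000):
    """Giant component proportion 1 - x for bond percolation with probability
    p_b on an n-regular graph without short loops."""
    x_tilde = 0.0
    for _ in range(iterations):
        # Probability that a neighbour, reached along one link, is not
        # in the giant component through its n - 1 other links.
        x_tilde = ((1.0 - p_b) + p_b * x_tilde) ** (n - 1)
    x = ((1.0 - p_b) + p_b * x_tilde) ** n
    return 1.0 - x


for p_b in (0.4, 0.6, 2.0 / 3.0, 0.9):
    size = regular_graph_giant_component(3, p_b)
    print(f"n = 3, p_b = {p_b:.3f}: giant component = {size:.4f}")
```

For n = 3 and p_b = 2/3 the quadratic for x̃ factorises to give x̃ = 1/4 and x = 1/8, which provides a simple check on the iteration.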



4.1.2 Transmission dynamics 

So what is the equivalent to the dynamical SIR model (1) for networks? In general, the disease
state of node i is either S, I or R. We write $A_i$ to indicate this: $S_i = 1$ if i is susceptible; $S_i = 0$
otherwise; and similar definitions hold for other states. Then we also use a notation where:

\[ [A] = \sum_i A_i , \qquad [AB] = \sum_{i,j} A_i B_j G_{ij} , \qquad [ABC] = \sum_{i,j,k \neq i} A_i B_j C_k G_{ij} G_{jk} , \tag{13} \]

making use of the adjacency matrix G defined in (5) above. We assume, as before, that infectious
individuals recover at a constant rate $\gamma$, but now infection does not happen homogeneously at rate
$\beta$ as in the simple SIR model (1). Instead, susceptible-infectious pairs [SI] become
infectious-infectious [II] at rate $\tau$. It follows that an epidemic on a network obeys the exact, but unclosed,
system of equations

\[
\begin{aligned}
\frac{d}{dt}[S] &= -\tau [SI] , \\
\frac{d}{dt}[I] &= \tau [SI] - \gamma [I] , \\
\frac{d}{dt}[R] &= \gamma [I] , \\
\frac{d}{dt}[SS] &= -2\tau [SSI] , \\
\frac{d}{dt}[SI] &= \tau \left( [SSI] - [ISI] - [SI] \right) - \gamma [SI] , \\
\frac{d}{dt}[SR] &= -\tau [ISR] + \gamma [SI] , \\
\frac{d}{dt}[II] &= 2\tau \left( [ISI] + [SI] \right) - 2\gamma [II] , \\
\frac{d}{dt}[IR] &= \tau [ISR] + \gamma \left( [II] - [IR] \right) , \\
\frac{d}{dt}[RR] &= 2\gamma [IR] .
\end{aligned}
\tag{14}
\]

One could, of course, keep writing down equations for the triples in terms of higher-order structure, 
but it is better to make assumptions that allow us to close these equations. For an n-regular graph, 
the typical choice is 

\[ [ABC] \approx \frac{n-1}{n} \frac{[AB][BC]}{[B]} . \tag{15} \]

This follows from assuming that nodes of type A and C are multinomially distributed about nodes of
type B with probabilities [AB]/(n[B]) and [CB]/(n[B]) respectively [7]. When first introduced, this
was really an assumption of Type III in the language of §1 above, made just to get some purchase
on the dynamical system. It turns out, however, that this assumption is numerically extremely 
accurate for SIR dynamics, and recent results suggest a formal proof for this observation [5]. 

There are two relevant results that can be obtained from manipulation of the closed dynamical
system obtained by substitution of (15) into (14) [16]. The first of these is the analogue of (2)
above. Early in the epidemic,

\[ I(t) \propto e^{rt} , \quad \text{where} \quad r = (n - 2)\tau - \gamma . \tag{16} \]

Therefore, we recover (2) if we hold $\beta = n\tau$ constant while taking $n \to \infty$. It is also possible to
manipulate the differential equations in a similar (but much more algebraically complex) manner
to that used to derive (4), which gives a result for the final proportion of the population susceptible,
s, as

\[ s = \left( 1 - \frac{\tau}{\tau + \gamma} + \frac{\tau}{\tau + \gamma} s^{(n-1)/n} \right)^{n} . \tag{17} \]

This is clearly equivalent to (12), but with the probability of transmission across a network link
$\tau/(\tau + \gamma)$ taking the place of the bond percolation probability $p_b$. So while the algebra gets more
complex, for graphs that are correlated through having fixed degree, it is possible to make a link
between epidemic dynamics and network theory.
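
The agreement between the dynamical and percolation pictures can be checked numerically. The sketch below integrates the pairwise equations for [S], [I], [SS] and [SI], closed with [ABC] ≈ ((n - 1)/n)[AB][BC]/[B], and compares the final susceptible proportion with the root of s = (1 - T + T s^((n-1)/n))^n, where T = τ/(τ + γ). All quantities are scaled by the population size; the initial conditions (infection placed at random, so [SS] = n S(0)² and [SI] = n S(0) I(0)), the step size and the function names are assumptions made for illustration.

```python
def pairwise_rhs(state, n, tau, gamma):
    """Closed pairwise SIR equations on an n-regular graph (per-node scaling)."""
    s, i, ss, si = state
    ssi = (n - 1) / n * ss * si / s  # closure for [SSI]
    isi = (n - 1) / n * si * si / s  # closure for [ISI]
    return (-tau * si,
            tau * si - gamma * i,
            -2.0 * tau * ssi,
            tau * (ssi - isi - si) - gamma * si)


def rk4_step(state, dt, n, tau, gamma):
    """One fourth-order Runge-Kutta step for the pairwise system."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = pairwise_rhs(state, n, tau, gamma)
    k2 = pairwise_rhs(shifted(k1, dt / 2), n, tau, gamma)
    k3 = pairwise_rhs(shifted(k2, dt / 2), n, tau, gamma)
    k4 = pairwise_rhs(shifted(k3, dt), n, tau, gamma)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def pairwise_final_susceptible(n, tau, gamma, i0=1e-4, t_end=60.0, dt=0.01):
    """Integrate until the epidemic is over and return the susceptible proportion."""
    s0 = 1.0 - i0
    state = (s0, i0, n * s0 * s0, n * s0 * i0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, n, tau, gamma)
    return state[0]


def percolation_final_susceptible(n, tau, gamma, iterations=10_000):
    """Smallest root of the final-size relation, by iteration from s = 0."""
    transmissibility = tau / (tau + gamma)
    s = 0.0
    for _ in range(iterations):
        s = (1.0 - transmissibility
             + transmissibility * s ** ((n - 1) / n)) ** n
    return s


dynamic = pairwise_final_susceptible(3, tau=2.0, gamma=1.0)
static = percolation_final_susceptible(3, tau=2.0, gamma=1.0)
print(f"pairwise ODE: {dynamic:.4f}, final-size relation: {static:.4f}")
```

For n = 3, τ = 2 and γ = 1 the final-size relation has the exact root s = 1/8; the integrated value differs slightly because the initial proportion infectious is small but not zero.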



Biological insight 

When there is just one infectious individual on an n-regular graph, then the rate of exponential
growth of infectious individuals is $n\tau - \gamma$, so why is there a factor of -2 in equation (16)? The first
thing to note is that every non-initial infectious individual must have been infected by someone, 
and that removes one from its potential pool of susceptibles, explaining half of the —2. The other 
half must therefore arise because early in the epidemic, the average infectious individual has also 
already infected exactly one of its contacts. The practical implication of this is that an epidemic 
on a regular graph will grow more slowly than its transmission and recovery rates would suggest. 



4.2 The configuration model 

In the discussion above, I dodged the question of how to construct a regular graph. The configuration
model (CM) [20] provides a way to construct networks with a given degree distribution;
however first it is worth considering what this might mean. Returning to Figure 3, the configuration
model is designed to construct networks of types (a) and (b) - these have few short, closed
loops and no particular preference for connections between nodes of similar degree. The method
for doing this is shown in Figure 4. Firstly, each node is given a number of 'stubs', and then these
are paired up in a random order to give a network. This process can lead to repeated links, and
links that start and end on the same node, but for most practical purposes these can be ignored.
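
The stub-pairing construction is short enough to sketch directly. The code below assumes a degree sequence with an even total, and, as in the description above, makes no attempt to remove repeated links or self-links; the function name and the example sequence are hypothetical.

```python
import random


def configuration_model(degrees, rng):
    """Pair stubs uniformly at random; return the resulting list of links.

    degrees[i] is the number of stubs given to node i. Repeated links and
    self-links may occur and are kept.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the total number of stubs must be even")
    # One entry per stub, labelled by the node that owns it
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined to form links
    return list(zip(stubs[0::2], stubs[1::2]))


# Hypothetical degree sequence: twenty nodes of degree 3 (a 3-regular network).
degrees = [3] * 20
links = configuration_model(degrees, random.Random(1))

# Each node appears as a link endpoint exactly as often as its degree.
realised = [0] * len(degrees)
for a, b in links:
    realised[a] += 1
    realised[b] += 1
print(realised == degrees)  # True
```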

In terms of epidemic dynamics on configuration model networks, a recent paper by Ball &
Neal noted that these could be reconstructed if the network is constructed at the same time
as the epidemic. The construction behind this is shown in Figure 5 and yields a closed system
of differential equations that we will call the BN model (a conceptually similar but much lower
dimensional set of equations was derived in [26], with recent results showing that this approach is
also exact [5]). Simpler approaches can therefore be tested against the BN model. Figure 6 shows
the typical results of doing this - pairwise and related models are numerically indistinguishable
from the BN predictions.

There are therefore various ways to derive the following result, which was prefigured in [10],
and generalises (16):

\[ I(t) \propto e^{rt} , \quad \text{where} \quad r = \left( \mathrm{mean}(k) + \frac{\mathrm{var}(k)}{\mathrm{mean}(k)} - 2 \right) \tau - \gamma , \tag{18} \]

where k is the node degree. This shows that an epidemic is only possible, regardless of the rate of
transmission compared to recovery, if

\[ \mathrm{mean}(k) + \frac{\mathrm{var}(k)}{\mathrm{mean}(k)} > 2 , \tag{19} \]

Figure 4: Construction of a CM network. (a) Individuals start with a degree, and are given
that number of stubs, then (b) stubs are paired at random to form a network.

Figure 5: The Ball & Neal construction. Every individual is initially allocated a
number of stubs. (a) An individual is the index case. (b) Infectious individuals
make links to other nodes proportionally to the target node's stub number.
Making a link to a susceptible individual makes that individual infectious.
Making a link reduces the stub number of both nodes in the link. (c) Individuals
recover at rate $\gamma$, at which point they use up their remaining links at random.

Figure 6: Comparison of pairwise and exact models on a 3-regular graph.



which turns out to be the criterion for the existence of a giant component of a CM network. We 
therefore retain the link between network topology (and hence percolation) and epidemic dynamics. 
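
Both the growth rate and the giant component criterion depend on the degree sequence only through its mean and variance, so they are easy to evaluate. The sketch below assumes the forms r = (mean(k) + var(k)/mean(k) - 2)τ - γ and mean(k) + var(k)/mean(k) > 2, with var(k) taken as the population variance of the degree sequence; the function name and example sequences are illustrative assumptions.

```python
def cm_growth_rate(degrees, tau, gamma):
    """Return (r, has_giant_component) for a configuration model network.

    r is the early exponential growth rate and has_giant_component tests
    whether mean(k) + var(k) / mean(k) exceeds 2.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    var_k = sum((k - mean_k) ** 2 for k in degrees) / n  # population variance
    excess = mean_k + var_k / mean_k - 2.0
    return excess * tau - gamma, excess > 0.0


# A 3-regular network recovers the regular-graph result r = (n - 2) tau - gamma.
print(cm_growth_rate([3] * 100, tau=2.0, gamma=1.0))

# Hypothetical heterogeneous sequence: mean degree below 2, but high variance.
mixed = [1] * 90 + [10] * 10
print(cm_growth_rate(mixed, tau=2.0, gamma=1.0))
```

The second example has mean degree 1.9, yet the criterion is comfortably satisfied because of the contribution of the few high-degree nodes to the variance.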

Biological insight 

An interesting consequence of (18) is that regardless of how small the mean node degree is, a
high variance can preserve a giant component and the ability of a disease to spread. This leads 
to highly connected individuals playing a particularly important role in the spread of disease, and 
there is some debate about whether an '80/20' rule holds for epidemics, with a small fraction of 
the population causing the majority of transmission. Were this to be the case, then interventions 
targeted at the highly connected individuals alone could stop a disease spreading, although it is 
appropriate to be cautious when proposing any new measure to control a disease. 

4.3 Assortativity 

The configuration model captures an important feature of real networks, namely that some indi- 
viduals have more connections than others. What it ignores is the possibility that highly connected 
individuals may have a preference to make links with other highly connected individuals. If we use 
notation like the pairwise model, so that [k] — Nk is the number of nodes of degree fc, and [Im] is 
the number of links between a node of degree I and a node of degree m in the network, then it is 
possible to quantify the preferences of individuals through a symmetric correlation matrix C: 

l[l\m[m\ 

If every $C_{l,m} = 1$, then the correlations are consistent with the configuration model. If
$C_{l,m} > 1$ for similar values of $l, m$ and $C_{l,m} < 1$ for dissimilar values of $l, m$, then the
network is called assortative; and a network is disassortative if the opposite relationship between
$C$ and the similarity of its indices holds. Of course, there is much more information in a matrix
than can be encoded unambiguously in a binary choice between assortativity and disassortativity
(the ambiguity being essentially what one means by 'similar'), but these remain useful concepts for
thinking about epidemic networks.
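As an illustration, the correlation matrix can be estimated directly from a list of links. The
sketch below assumes the normalisation $C_{l,m} = [lm] \sum_k k[k] / (l[l] \, m[m])$, with $[lm]$
counting each link in both directions as in the pairwise notation, so that a configuration model
network gives values close to 1. The function name and input format are hypothetical.

```python
from collections import Counter, defaultdict


def degree_correlation_matrix(edges):
    """Return {(l, m): C_lm} for an undirected simple graph.

    edges is an iterable of (u, v) pairs, with each link listed once.
    Pairs of degrees that are never linked are absent from the result (C = 0).
    """
    edges = list(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    nodes_with_degree = Counter(degree.values())   # [k]
    total_stubs = sum(degree.values())              # sum_k k[k]
    link_count = defaultdict(int)                   # [lm], both directions
    for u, v in edges:
        l, m = degree[u], degree[v]
        link_count[(l, m)] += 1
        link_count[(m, l)] += 1
    return {
        (l, m): count * total_stubs
        / (l * nodes_with_degree[l] * m * nodes_with_degree[m])
        for (l, m), count in link_count.items()
    }


# A star with three leaves: every link joins a degree-1 node to the degree-3 hub.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(degree_correlation_matrix(star))
```

In the star example all links join dissimilar degrees, which is the disassortative extreme for
that degree sequence.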

If we have a target $C$ in mind, then Newman [23] suggested a method for producing a network
with that correlation structure. Starting with a CM network, link swaps are proposed as shown in
Figure 7. Writing $i, j$ and $k, l$ for the degrees of the nodes joined by the two existing links, and
$i, k$ and $j, l$ for the degrees of the nodes joined by the two proposed links, such a swap is then
performed if

\[ \mathrm{rand} < \frac{C_{i,k} C_{j,l}}{C_{i,j} C_{k,l}} , \tag{21} \]

where rand is a random number picked uniformly between 0 and 1. This is a form of
Metropolis-Hastings sampling, which should converge on a set of networks with appropriate degree
correlations given a large enough initial network and sufficient computer time.
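A minimal sketch of such a rewiring procedure is given below. It is not Newman's reference
implementation: it assumes that two links a-b and c-d (hypothetical labels) are replaced by a-c
and b-d, that the acceptance ratio is the product of target correlations for the proposed links
divided by that for the existing links, and that swaps creating self-links or repeated links are
simply rejected. The function name `rewire` and the callable `target_c` are illustrative.

```python
import random


def rewire(edges, target_c, n_proposals, seed=0):
    """Degree-preserving link swaps accepted by a Metropolis-Hastings rule.

    edges: list of (u, v) pairs for a simple undirected graph.
    target_c: function (l, m) -> target correlation C_lm >= 0.
    Returns a new list of links with the same degree sequence.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    present = {frozenset(e) for e in edges}

    for _ in range(n_proposals):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # random orientation of the second link
        new1, new2 = frozenset((a, c)), frozenset((b, d))
        if a == c or b == d or new1 in present or new2 in present:
            continue                         # would create a self-link or repeated link
        old = target_c(degree[a], degree[b]) * target_c(degree[c], degree[d])
        new = target_c(degree[a], degree[c]) * target_c(degree[b], degree[d])
        if old > 0 and rng.random() >= new / old:
            continue                         # proposal rejected
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, c), (b, d)
    return edges


# Hypothetical assortative target: favour links between nodes of equal degree.
ring_with_chords = [(i, (i + 1) % 12) for i in range(12)] + [(0, 4), (0, 8), (4, 8)]
rewired = rewire(ring_with_chords, lambda l, m: 2.0 if l == m else 0.5, 2000)
```

Because each accepted swap exchanges the partners of two links, every node keeps its degree, so
only the correlation structure changes.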
Figure 7: Rewiring / link-swapping move for construction of assortative heterogeneous networks.
This move is proposed by picking pairs of links at random, and the proposed modification to the
network is made or not according to the standard Metropolis-Hastings rules.



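To model epidemics on such assortative networks, the paper [T^] starts from a generalisation
of (14) that indexes nodes by their degree:

\[
\begin{aligned}
\dot{[S_k]} &= -\tau [S_k I] , &
\dot{[S_k S_l]} &= -\tau \left( [S_k S_l I] + [S_l S_k I] \right) , \\
\dot{[I_k]} &= \tau [S_k I] - \gamma [I_k] , &
\dot{[S_k I_l]} &= \tau \left( [S_k S_l I] - [I S_k I_l] \right) - \tau [S_k I_l] - \gamma [S_k I_l] , \\
\dot{[R_k]} &= \gamma [I_k] , &
\dot{[S_k R_l]} &= -\tau [I S_k R_l] + \gamma [S_k I_l] , \\
& & \dot{[I_k I_l]} &= \tau \left( [I S_k I_l] + [I S_l I_k] \right) + \tau \left( [S_k I_l] + [S_l I_k] \right) - 2\gamma [I_k I_l] , \\
& & \dot{[I_k R_l]} &= \tau [I S_k R_l] + \gamma \left( [I_k I_l] - [I_k R_l] \right) , \\
& & \dot{[R_k R_l]} &= \gamma \left( [I_k R_l] + [I_l R_k] \right) ,
\end{aligned}
\tag{22}
\]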

so $[A_k] = \sum_i A_i \delta_{k_i,k}$, where $\delta$ is the Kronecker delta and $k_i$ is the degree of
node $i$, is the number of nodes of degree $k$ in disease state $A$, and similarly for pairs. Omission
of a subscript index stands for an implicit sum, e.g. $[S_k I] = \sum_m [S_k I_m]$. As before, these
equations are exact but unclosed, and the moment closure
proposed is

\[ [A_k B_l C_m] \approx \frac{l-1}{l} \frac{[A_k B_l][B_l C_m]}{[B_l]} . \tag{23} \]

It is possible to manipulate the closed set of equations produced to derive a further generalisation
of the growth rates obtained above for regular and configuration model networks:



\[ I(t) \propto e^{rt} , \quad \text{where} \quad r = (\Lambda(M) - 1)\tau - \gamma , \quad (M)_{l,m} = \frac{(m-1)[lm]}{m[m]} , \tag{24} \]

and $\Lambda(M)$ is the dominant eigenvalue of the matrix $M$. An equivalence between network theory
and epidemic dynamics is maintained here: $\Lambda(M) > 1$ is the condition for the existence of a giant
component.
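The dominant eigenvalue is easy to obtain numerically. The sketch below assumes the form
$(M)_{l,m} = (m-1)[lm] / (m[m])$, which reproduces $r = (n-2)\tau - \gamma$ on an $n$-regular graph,
and uses a shifted power iteration in pure Python. The function names and the example networks
are hypothetical.

```python
def dominant_eigenvalue(link_counts, node_counts, iterations=2000):
    """Dominant eigenvalue of M_lm = (m - 1)[lm] / (m[m]).

    link_counts maps (l, m) -> [lm], counting each link in both directions.
    node_counts maps k -> [k]. Power iteration is applied to M + I so that
    periodic matrices also converge; the shift is removed at the end.
    """
    degrees = sorted(k for k in node_counts if k > 0)

    def m_entry(l, m):
        return (m - 1) * link_counts.get((l, m), 0) / (m * node_counts[m])

    x = {k: 1.0 for k in degrees}
    shifted = 1.0
    for _ in range(iterations):
        y = {l: x[l] + sum(m_entry(l, m) * x[m] for m in degrees) for l in degrees}
        shifted = sum(y.values()) / sum(x.values())
        total = sum(y.values())
        x = {k: v / total for k, v in y.items()}
    return shifted - 1.0


def growth_rate(link_counts, node_counts, tau, gamma):
    """Early growth rate r = (Lambda(M) - 1) * tau - gamma."""
    return (dominant_eigenvalue(link_counts, node_counts) - 1.0) * tau - gamma


# Hypothetical population: 1000 individuals in isolated couples (degree 1)
# and a core group of 10 individuals of degree 3 linked only to each other.
node_counts = {1: 1000, 3: 10}
assortative_links = {(1, 1): 1000, (3, 3): 30}

# Same degrees with configuration model mixing, [lm] = l[l] m[m] / sum_k k[k].
stubs = sum(k * c for k, c in node_counts.items())
cm_links = {(l, m): l * node_counts[l] * m * node_counts[m] / stubs
            for l in node_counts for m in node_counts}

print(dominant_eigenvalue(assortative_links, node_counts))
print(dominant_eigenvalue(cm_links, node_counts))
```

Comparing the two printed values shows how the same degree sequence can lie on either side of
the threshold $\Lambda(M) = 1$ depending only on who is linked to whom.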

Biological insight 

It is not immediately obvious what the dominant eigenvalue of a general matrix looks like. But it 
turns out that in the same way that heterogeneous networks can have low mean degree and still 
support an epidemic, it is possible for assortativity to concentrate sustained transmission between 
highly connected individuals even if the equivalent configuration model network would not sustain 
an epidemic. An example of this is sexually transmitted diseases, where much of the population 
could be in long-term partnerships, forming a set of small components of size 2, while transmission 
is sustained amongst a 'core group'. In this scenario, the targeting of interventions at those with 
many connections may well be suboptimal, since these may do little to reduce prevalence in the core 
group while failing to halt transmission between the core group and individuals on its periphery. 
5 Clustering



5.1 Small worlds 

To start with, let us define two network properties in terms of the adjacency matrix $G$ as defined
in (2). First, the clustering coefficient $\phi$ is the number of closed triples (those forming
triangles) divided by the total number of triples, closed and unclosed:

\[ \phi = \frac{\sum_{i,j,k} G_{ij} G_{jk} G_{ki}}{\sum_{i,j,k} G_{ij} G_{jk} (1 - \delta_{ik})} \in [0, 1] , \tag{25} \]

where $\delta_{ik}$ is the Kronecker delta. Secondly, the shortest path length $d_{ij}$ between two
distinct nodes is the minimum number of links needed to form an unbroken path between them:

\[ d_{ij} = \min \{ p \mid (G^p)_{ij} \geq 1 \} . \tag{26} \]

The 'small world' effect is essentially that observed networks of contacts often have significant
values of $\phi$, but low integer values of $d_{ij}$: your contacts are likely to contact each other,
and you are probably at most six handshakes (or even sneezes) away from the majority of people on Earth.
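Both quantities can be computed directly from an adjacency matrix. The sketch below is a
brute-force transcription of the two definitions for small networks, assuming $G$ is a symmetric
0/1 matrix with zero diagonal stored as a list of lists; the function names are hypothetical and
no attempt is made at efficiency.

```python
def clustering_coefficient(G):
    """Closed triples divided by all triples, as sums over the adjacency matrix."""
    n = len(G)
    closed = sum(G[i][j] * G[j][k] * G[k][i]
                 for i in range(n) for j in range(n) for k in range(n))
    triples = sum(G[i][j] * G[j][k]
                  for i in range(n) for j in range(n) for k in range(n) if i != k)
    return closed / triples if triples else 0.0


def shortest_path_length(G, i, j):
    """Smallest p with (G^p)_ij >= 1, or None if i and j are not connected."""
    n = len(G)
    power = [row[:] for row in G]            # holds G^p, starting from p = 1
    for p in range(1, n):
        if power[i][j] >= 1:
            return p
        power = [[sum(power[a][c] * G[c][b] for c in range(n)) for b in range(n)]
                 for a in range(n)]
    return None


# A ring of six nodes: no triangles, and opposite nodes are three links apart.
ring = [[1 if abs(a - b) in (1, 5) else 0 for b in range(6)] for a in range(6)]
print(clustering_coefficient(ring), shortest_path_length(ring, 0, 3))
```

Matrix powers are used for the path length only because that is how the definition is phrased;
a breadth-first search would be the usual choice for larger networks.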

At first sight, this creates a paradox, because the easiest clustered networks to visualise are
lattices, which have large values of $d_{ij}$. Watts and Strogatz [27] showed that this could be
overcome through the introduction of a small number of random links to a lattice. While this work is
widely (and correctly) perceived as solving an important conceptual problem, other networks with the
small world properties of significant $\phi$ and small integer values for $d_{ij}$ have recently been
proposed that are more realistic for modelling epidemics.



Biological insight 

Historical epidemics like the Black Death spread over years through Europe at walking pace, but 
in the modern age pandemics can cross continents in hours. The low path lengths seen in 'small 
world' networks explain this change. People have kept the same household and local community 
contacts that they had throughout history, but modern transportation means that business and 
leisure travel can happen over previously unimaginable distances. Short path lengths have been 
a feature of all networks considered in this paper so far, but what about the clustering in our 
local contacts? The precise impact of clustering on epidemic dynamics is subtle, and cannot be 
condensed (yet) into a set of straightforward biological insights. We shall consider instead some 
possible routes to gain traction on the problem. 



5.2 Triangles and percolation 

The presence of an appreciable number of triangles in a network forms a mathematical
inconvenience for a rather subtle reason. To see why, let us consider two different scenarios
and two different models. Scenario I is an epidemic started from the middle node of an unclosed
triple; and Scenario II is an epidemic started from one node of a triangle. In Model 1, each
infectious individual is equally transmissible and has a probability $T < 0.5$ of transferring
infection to each of its contacts; and in Model 2, half of infected individuals have zero probability
of transmitting to each contact and half have probability $\tilde{T} = 2T$ of transmitting to each
contact. Table 2 goes through all of the possibilities for epidemics for these scenarios, and gives
us the following results for expected final epidemic size depending on scenario and model:

\[ R_\infty^{\mathrm{I},1} = 1 + 2T , \quad R_\infty^{\mathrm{I},2} = 1 + 2T , \]

\[ R_\infty^{\mathrm{II},1} = 1 + 2T(1 + T - T^2) , \quad R_\infty^{\mathrm{II},2} = 1 + 2T(1 + T - 2T^2) . \tag{27} \]

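Percolation therefore gives the correct expected final size for Model 1 on either structure, and for
Model 2 on an unclosed triple. For Model 2 on a triangle, however, the two links leaving an infected
individual either both transmit with probability $2T$ or cannot transmit at all, so we need a process
that correlates the presence of links with shared nodes, meaning that percolation is unsuitable.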
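These expected final sizes can be checked by exhaustive enumeration. The sketch below assumes the
usual mapping of an SIR epidemic onto directed percolation: each node is assigned a transmissibility
type, each directed link is open with the transmissibility of its source, and the final size is the
number of nodes reachable from the initial case. The function name and data layout are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_final_size(neighbours, start, types):
    """Exact expected number of cases, including the initial one.

    neighbours: dict node -> list of adjacent nodes.
    types: list of (probability, transmissibility) pairs describing the
           distribution of per-contact transmission probability.
    """
    nodes = sorted(neighbours)
    arcs = [(u, v) for u in nodes for v in neighbours[u]]
    total = Fraction(0)
    for assignment in product(types, repeat=len(nodes)):
        type_prob = Fraction(1)
        trans = {}
        for node, (p, t) in zip(nodes, assignment):
            type_prob *= p
            trans[node] = t
        for flags in product((False, True), repeat=len(arcs)):
            prob = type_prob
            for (u, v), is_open in zip(arcs, flags):
                prob *= trans[u] if is_open else 1 - trans[u]
            if prob == 0:
                continue
            open_arcs = {arc for arc, is_open in zip(arcs, flags) if is_open}
            infected, frontier = {start}, [start]
            while frontier:                  # follow open directed links
                u = frontier.pop()
                for v in neighbours[u]:
                    if (u, v) in open_arcs and v not in infected:
                        infected.add(v)
                        frontier.append(v)
            total += prob * len(infected)
    return total


T = Fraction(1, 5)
triple = {0: [1], 1: [0, 2], 2: [1]}         # Scenario I, started at node 1
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # Scenario II, started at node 0
model_1 = [(Fraction(1), T)]
model_2 = [(Fraction(1, 2), Fraction(0)), (Fraction(1, 2), 2 * T)]

for graph, start in ((triple, 1), (triangle, 0)):
    for model in (model_1, model_2):
        print(expected_final_size(graph, start, model))
```

Exact rational arithmetic is used so that the results can be compared with the closed-form
expressions without rounding error.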
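Scenario  Epidemic (diagram, described in words)           2° cases  Model 1 Prob.  Model 2 Prob.

I         no transmission                                  0         $(1-T)^2$      $\frac{1}{2}(1+(1-\tilde{T})^2)$
I         first neighbour only                             1         $T(1-T)$       $\frac{1}{2}\tilde{T}(1-\tilde{T})$
I         second neighbour only                            1         $T(1-T)$       $\frac{1}{2}\tilde{T}(1-\tilde{T})$
I         both neighbours                                  2         $T^2$          $\frac{1}{2}\tilde{T}^2$

II        no transmission                                  0         $(1-T)^2$      $\frac{1}{2}(1+(1-\tilde{T})^2)$
II        first neighbour only, no onward transmission     1         $T(1-T)^2$     $\frac{1}{4}\tilde{T}(1-\tilde{T})(2-\tilde{T})$
II        second neighbour only, no onward transmission    1         $T(1-T)^2$     $\frac{1}{4}\tilde{T}(1-\tilde{T})(2-\tilde{T})$
II        first neighbour, who then infects the second     2         $T^2(1-T)$     $\frac{1}{4}\tilde{T}^2(1-\tilde{T})$
II        second neighbour, who then infects the first     2         $T^2(1-T)$     $\frac{1}{4}\tilde{T}^2(1-\tilde{T})$
II        both neighbours infected by the initial case     2         $T^2$          $\frac{1}{2}\tilde{T}^2$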



Table 2: Table showing epidemic tree probabilities on I: an unclosed triple; and II: a triangle; 
for Model 1: fixed infectious period; and Model 2: bimodal infectious period. In the diagrams, 
a large circle corresponds to the initial infectious individual, a solid line corresponds to a link 
that transmitted infection, the absence of a line corresponds to no transmission of infection, and a 
dotted line means a link whose role in transmission is not specified. 



5.3 Local tree-like structure 

Given that percolation does not work in a straightforward manner on networks with an appreciable
clustering coefficient, a current research topic is to find special kinds of clustered networks in
which there are short closed loops, but the next level up in the network still looks 'tree-like'.
If the local, clustered structure is small enough for an exact epidemic model to be solved on it by
going through a process like that in Table 2, then there is a chance of piecing together solved local
structures and solvable global connection rules.

Two such approaches are shown in Figure 8. The first of these, shown in (a), is the triangle
configuration model [21, 19]. In this generalisation of the CM, nodes are assigned a demand for
triangles in addition to normal links, and random selections of three nodes at a time are made to
satisfy this demand. The second, shown in (b), is a clique-based model [31 0], in which nodes are
placed in fully connected local subgraphs called cliques, then given a configuration-model-like
demand for global links. Both of these constructions allow analytic results to be obtained, and
dynamics to be written down.
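The first construction can be sketched in a few lines. The code below is an illustration rather
than the algorithm of the cited papers: it assumes each node is given a demand for single links
and for triangle corners, pairs link stubs and groups triangle corners at random, and simply
discards self-links and merges repeated links, which is a simplification. All names are hypothetical.

```python
import random


def triangle_configuration_model(single_links, triangles, seed=0):
    """Random graph in which node i wants single_links[i] ordinary links
    and triangles[i] triangle corners. Returns a set of frozenset({u, v}) links.
    """
    rng = random.Random(seed)
    stubs = [i for i, s in enumerate(single_links) for _ in range(s)]
    corners = [i for i, t in enumerate(triangles) for _ in range(t)]
    if len(stubs) % 2 or len(corners) % 3:
        raise ValueError("need an even number of stubs and corners in multiples of 3")
    rng.shuffle(stubs)
    rng.shuffle(corners)

    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:                            # drop self-links (simplification)
            edges.add(frozenset((a, b)))
    for a, b, c in zip(corners[::3], corners[1::3], corners[2::3]):
        if len({a, b, c}) == 3:               # drop degenerate triangles
            edges.update(frozenset(pair) for pair in ((a, b), (b, c), (c, a)))
    return edges


# Six nodes that each want one triangle and no other links: two disjoint triangles.
print(sorted(tuple(sorted(e)) for e in triangle_configuration_model([0] * 6, [1] * 6)))
```

For large sparse networks the discarded links are a vanishing fraction of the total, which is why
the simplification is usually harmless in that limit.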



5.4 Dynamical clustering 

The pairwise approach outlined in §4.1.2 above lends itself naturally to consideration of epidemic
dynamics on networks with appreciable clustering coefficient. This is achieved by modification
of the closure relationship (15). The traditional clustered closure (sometimes attributed to
Kirkwood [15] and analysed for SIR epidemics in [TBHH]) is

\[ [ABC] \approx \frac{n-1}{n} \frac{[AB][BC]}{[B]} \left( (1 - \phi) + \phi \frac{N}{n} \frac{[CA]}{[C][A]} \right) . \tag{28} \]

Several improvements to this closure have been suggested, including some that are much more
readily interpretable [14, 24], but no network with significant $\phi$ has yet been found for which
any given closure is exact. Despite this, the system of equations is often good enough for practical
purposes [13]. It is also possible to use (28) together with (14) to derive analytic results; in
particular, an approximate linear correction to (16) is

\[ I(t) \propto e^{rt} , \quad \text{where} \quad r \approx \tau \left( (n-2) - \frac{2(n-1)\left( 2(n-1)(n-2)\tau + n\gamma \right)}{n^2 \left( (n-2)\tau + \gamma \right)} \phi \right) - \gamma + O(\phi^2) . \tag{29} \]

In contrast to heterogeneity in degree distribution and assortativity above, the inclusion of clus- 
tering reduces the potential of an epidemic to invade a population at all values of transmission 
parameters. 
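The size of this reduction is easy to evaluate. The sketch below reads the linear correction as
$r \approx \tau \left( (n-2) - c \, \phi \right) - \gamma$ with
$c = 2(n-1)(2(n-1)(n-2)\tau + n\gamma) / (n^2((n-2)\tau + \gamma))$, drops the $O(\phi^2)$ terms,
and assumes $(n-2)\tau + \gamma > 0$. The function name and parameter values are hypothetical.

```python
def clustered_growth_rate(n, tau, gamma, phi):
    """Early growth rate on an n-regular network with clustering coefficient phi,
    to first order in phi. Setting phi = 0 recovers r = (n - 2) * tau - gamma.
    """
    unclustered = (n - 2) * tau - gamma
    coefficient = (2 * (n - 1) * (2 * (n - 1) * (n - 2) * tau + n * gamma)
                   / (n ** 2 * ((n - 2) * tau + gamma)))
    return unclustered - tau * coefficient * phi


# Hypothetical parameters: 4 contacts each, transmission rate 0.3, recovery rate 0.2.
for phi in (0.0, 0.1, 0.2):
    print(phi, clustered_growth_rate(4, 0.3, 0.2, phi))
```

Since the coefficient multiplying $\phi$ is positive whenever $n \geq 2$ and the rates are positive,
the first-order growth rate decreases monotonically with clustering.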
Figure 8: (a) Construction step for a triangular configuration model network, showing three nodes
each with unmet demand coordinating to make a triangle. (b) Construction step for a clique-based
network, showing a node attached to a 3-clique with three stubs recruiting other stubs from a 4- and
5-clique with one stub each and an isolated node with three stubs.



Figure 9: 'Big-V' rewiring / link-swapping move for construction of clustered random graphs.
This move is always accepted if it increases the clustering coefficient.



While clustered pairwise models are attractive due to their relatively low system dimension
and number of parameters, the arbitrary nature of closure proposals such as (28) is somewhat
unsatisfactory and it would be nice to have a better understanding of what makes a closure work
(or not). One observation is that agreement is often best with $n$-regular graphs that have had
clustering introduced by the rewiring shown in Figure 9 [6, 14], but understanding this observation
remains an active area of research.

6 Concluding remarks 

This review has focused on methods for determining the conditions under which an epidemic 
will take off in a population, paying particular attention to the links between network theory and 
epidemiology. This helps to understand the role that heterogeneity, preference and transitivity play 
in shaping the transmission dynamics of human pathogens, and ultimately this understanding can 
lead to the improvement of intervention strategies to control and mitigate the burden of infectious 
disease. 

My focus has been on the statistical physics technique of percolation, and the dynamical systems
technique of first-order differential equations. A key omission has been the contribution made by
more mathematical researchers from the field of probability theory, who have been able to derive
many of the results presented here in full rigour, and also shed light on the deep reasons for the
link between epidemics and networks. This is done using techniques that are not familiar to most
physicists, and so is beyond the scope of this work. Excellent monographs introducing applied
probability approaches to random graphs [TT] and epidemics [I] are, however, available.

For further reading, reviews that go into more detail on network epidemiology include 0[S]. I 
hope that this review, however, will encourage readers with a background in physical sciences to 
invest the time in reading more about this interesting field of biological research, where techniques 
from physics can make an important contribution. 